knitr::opts_chunk$set(echo = TRUE)
In this notebook I carry out an exploratory analysis in R of a table with data on Data Science (DS) programs in the USA.
The table timesMergedData.csv can be downloaded here: https://www.kaggle.com/sriharirao/datascience-universities-across-us/data.
First I load timesMergedData.csv into dsp.
dsp <- read.csv('timesMergedData.csv')
The data frame has 954 rows and 27 columns, as dim(dsp) shows:
dim(dsp)
## [1] 954 27
The table's metadata is not available online. The table below therefore describes the content of each column; I use a question mark ? wherever I could not make sense of a column's content.
| Column | Type | Description |
|---|---|---|
| SCHOOL | String | School |
| STATE | String | State |
| CITY | String | City |
| NOC | Numeric | ? |
| PROGRAM | String | Program |
| TYPE | String | Program type: ‘C’ - Certificate, ‘M’ - Master |
| DEPARTMENT | String | Department |
| DELIVERY | String | Campus, online or hybrid |
| DURATION | String | Duration |
| PREREQ | String | Prerequisites |
| LINK | String | Link |
| LOC_LAT | Numeric | Latitude |
| LOC_LONG | Numeric | Longitude |
| WORLD_RANK | Numeric | World ranking |
| COUNTRY | String | Country (United States of America) |
| TEACHING | Numeric | ? |
| INTERNATIONAL | Numeric | ? |
| RESEARCH | Numeric | ? |
| CITATIONS | Numeric | ? |
| INCOME | Numeric | ? |
| TOTAL_SCORE | Numeric | ? |
| NUM_STUDENTS | Numeric | Number of students |
| STUDENT_STAFF_RATIO | Numeric | ? |
| INTERNATIONAL_STUDENTS | String | Percentage of international students |
| F_M_RATIO | String | ? |
| YEAR | Numeric | Year |
| timesData | Numeric | ? |
I use str(dsp) to inspect the structure of the data frame.
str(dsp)
## 'data.frame': 954 obs. of 27 variables:
## $ SCHOOL : Factor w/ 219 levels "Albright College",..: 1 2 3 3 4 4 4 4 4 4 ...
## $ STATE : Factor w/ 40 levels "Alabama","Arizona",..: 32 5 7 7 2 2 2 2 2 2 ...
## $ CITY : Factor w/ 173 levels "Adelphi","Albuquerque",..: 127 11 164 164 155 155 155 155 155 155 ...
## $ NOC : int 1 1 2 2 1 1 1 1 1 1 ...
## $ PROGRAM : Factor w/ 312 levels "Advanced Certificate in Applied Statistics",..: 91 176 274 179 198 198 198 198 198 198 ...
## $ TYPE : Factor w/ 2 levels "C","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ DEPARTMENT : Factor w/ 274 levels "Adult and Graduate Studies",..: 113 190 161 161 269 269 269 269 269 269 ...
## $ DELIVERY : Factor w/ 14 levels "Blended","Campus",..: 9 9 9 9 11 11 11 11 11 11 ...
## $ DURATION : Factor w/ 189 levels "1 Year","1 semester",..: 70 114 35 90 181 181 181 181 181 181 ...
## $ PREREQ : Factor w/ 31 levels "Not Available",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LINK : Factor w/ 372 levels "dead link - program appears to no longer be offered",..: 153 155 337 154 339 339 339 339 339 339 ...
## $ LOC_LAT : num 40.4 39.7 38.9 38.9 33.4 ...
## $ LOC_LONG : num -75.9 -104.8 -77.1 -77.1 -111.9 ...
## $ WORLD_RANK : Factor w/ 138 levels "1","10","102",..: NA NA 92 92 30 38 32 19 53 56 ...
## $ COUNTRY : Factor w/ 1 level "United States of America": NA NA 1 1 1 1 1 1 1 1 ...
## $ TEACHING : num NA NA 42.2 42.2 33.8 43 38.4 38.2 35.7 32.4 ...
## $ INTERNATIONAL : num NA NA 28.9 28.9 28.6 24.1 27.4 26.1 29.5 31.9 ...
## $ RESEARCH : num NA NA 16.5 16.5 35.9 44.1 45.2 39 37.5 38.1 ...
## $ CITATIONS : num NA NA 41.1 41.1 83.6 66.9 79.9 80.3 73.1 84.6 ...
## $ INCOME : Factor w/ 143 levels "-","100","24.2",..: NA NA 58 58 32 1 34 12 39 35 ...
## $ TOTAL_SCORE : Factor w/ 160 levels "-","44.8","45",..: NA NA 1 1 22 30 34 27 14 26 ...
## $ NUM_STUDENTS : Factor w/ 58 levels "10,646","10,788",..: NA NA 4 4 56 56 56 56 56 56 ...
## $ STUDENT_STAFF_RATIO : num NA NA 12 12 29.9 29.9 29.9 29.9 29.9 29.9 ...
## $ INTERNATIONAL_STUDENTS: Factor w/ 25 levels "10%","11%","12%",..: NA NA 3 3 25 25 25 25 25 25 ...
## $ F_M_RATIO : Factor w/ 27 levels "","1.011805556",..: NA NA 25 25 15 15 15 15 15 15 ...
## $ YEAR : int NA NA 2016 2016 2014 2011 2013 2012 2015 2016 ...
## $ timesData : int 0 0 1 1 1 1 1 1 1 1 ...
I drop the columns I will not use.
dsp <- dsp[, c('SCHOOL',
'STATE',
'CITY',
'PROGRAM',
'TYPE',
'DEPARTMENT',
'DELIVERY',
'LINK',
'LOC_LAT',
'LOC_LONG',
'NUM_STUDENTS',
'INTERNATIONAL_STUDENTS',
'YEAR')]
The columns in dsp are mostly categorical variables, so the data visualization below relies heavily on bar charts. The page http://uc-r.github.io/barcharts is an excellent resource for producing different kinds of bar charts in R with dplyr and ggplot2.
For the maps I use the leaflet and rgdal packages. The shapefile (.shp) with the US state boundaries is available at https://www.census.gov/geo/maps-data/data/cbf/cbf_state.html.
First I load the required packages.
require(magrittr, quietly = TRUE, warn.conflicts = FALSE)
require(dplyr, quietly = TRUE, warn.conflicts = FALSE)
require(tidyr, quietly = TRUE, warn.conflicts = FALSE)
require(ggplot2, quietly = TRUE, warn.conflicts = FALSE)
require(leaflet, quietly = TRUE, warn.conflicts = FALSE)
require(rgdal, quietly = TRUE, warn.conflicts = FALSE)
## rgdal: version: 1.2-8, (SVN revision 663)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 2.1.2, released 2016/10/24
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rgdal/gdal
## Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
## Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rgdal/proj
## Linking to sp version: 1.2-5
require(ngram, quietly = TRUE, warn.conflicts = FALSE)
How many records are there for each YEAR?
summary(as.factor(dsp$YEAR))
## 2011 2012 2013 2014 2015 2016 NA's
## 87 114 110 112 115 134 282
From here on I keep only the records from 2016 and those with a missing YEAR (NA).
dsp %<>% filter(YEAR == 2016 | is.na(YEAR))
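The `is.na(YEAR)` clause is needed because in R a comparison with `NA` returns `NA`, not `FALSE`, and `dplyr::filter()` drops rows whose condition is `NA`. A minimal base R sketch on a hypothetical toy data frame (not the real data):

```r
# Hypothetical toy data, not taken from timesMergedData.csv
toy <- data.frame(SCHOOL = c("A", "B", "C", "D"),
                  YEAR   = c(2016, 2015, NA, 2016),
                  stringsAsFactors = FALSE)

# NA == 2016 evaluates to NA, not FALSE
cmp <- toy$YEAR == 2016
print(cmp)  # TRUE FALSE NA TRUE

# Filtering on YEAR == 2016 alone would drop the NA row;
# adding is.na(YEAR) keeps it, because NA | TRUE is TRUE in R
keep <- toy$YEAR == 2016 | is.na(toy$YEAR)
kept <- toy[keep, ]
print(kept$SCHOOL)  # "A" "C" "D"
```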
I keep only unique observations.
dsp %<>% distinct()
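`distinct()` keeps one copy of each set of fully identical rows. After columns are dropped, rows that differed only in the removed columns can become exact duplicates. In base R the same idea is `unique()` on a data frame, sketched here on hypothetical rows:

```r
# Hypothetical rows: the last two are identical in every column
progs <- data.frame(SCHOOL  = c("X", "Y", "Y"),
                    PROGRAM = c("MS Analytics", "MS Data Science", "MS Data Science"),
                    stringsAsFactors = FALSE)

dedup <- unique(progs)  # keeps the same rows as dplyr::distinct(progs)
print(nrow(dedup))      # 2

# Equivalent formulation with duplicated()
dedup2 <- progs[!duplicated(progs), ]
stopifnot(identical(dedup, dedup2))
```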
The delivery-format categories in DELIVERY are:
#summary(dsp$DELIVERY)
dsp %>% group_by(DELIVERY) %>% summarize(n=n())
## # A tibble: 14 x 2
## DELIVERY n
## <fctr> <int>
## 1 Blended 1
## 2 Campus 246
## 3 Campus and online 1
## 4 Campus or Online 12
## 5 Campus or online 3
## 6 Campus, Online 1
## 7 Hybrid 8
## 8 On Campus 1
## 9 Online 134
## 10 Online (one Saturday per month on-campus) 1
## 11 Online or Campus 2
## 12 Online or On Campus 3
## 13 Online or campus 1
## 14 Online, campus, or hybrid 1
I merge the redundant categories of DELIVERY into a new column, DELIVERY2.
dsp %<>%
mutate(DELIVERY2 = recode(DELIVERY,
'Blended' = 'Hybrid',
'Campus and online' = 'Hybrid',
'Campus or Online' = 'Campus or online',
'Campus, Online' = 'Campus or online',
'On Campus' = 'Campus',
'Online (one Saturday per month on-campus)' = 'Hybrid',
'Online or Campus' = 'Campus or online',
'Online or On Campus' = 'Campus or online',
'Online or campus' = 'Campus or online',
'Online, campus, or hybrid' = 'Campus or online'))
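The `recode()` call above is essentially a many-to-one lookup table. A base R sketch of the same idea, using a named character vector as the lookup; only a few of the mappings above are included, for illustration:

```r
delivery <- c("Blended", "Campus", "On Campus", "Online or campus", "Online")

# Lookup: original label -> collapsed label (subset of the mapping above)
lookup <- c("Blended"          = "Hybrid",
            "On Campus"        = "Campus",
            "Online or campus" = "Campus or online")

# Labels missing from the lookup are kept unchanged, as recode() does
delivery2 <- ifelse(delivery %in% names(lookup), lookup[delivery], delivery)
print(delivery2)
# "Hybrid" "Campus" "Campus" "Campus or online" "Online"
```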
The categories of DELIVERY2 are:
dsp %>% group_by(DELIVERY2) %>% summarize(n=n())
## # A tibble: 4 x 2
## DELIVERY2 n
## <fctr> <int>
## 1 Hybrid 11
## 2 Campus 247
## 3 Campus or online 23
## 4 Online 134
# x = 'FORMAT' is a constant, so all programs are stacked into a single bar
ggplot(dsp, aes(x = 'FORMAT', fill = DELIVERY2)) +
geom_bar(position = position_stack(), colour = 'grey', alpha = 0.7, width = .5) +
labs(title = "Number of DS programs by DELIVERY2", x = "", y = "")
Which SCHOOL has the most programs?
dsp %>% group_by(SCHOOL, STATE) %>% tally() %>% arrange(desc(n)) %>% filter(n > 2)
## # A tibble: 53 x 3
## # Groups: SCHOOL [53]
## SCHOOL STATE n
## <fctr> <fctr> <int>
## 1 Bentley University Massachusetts 9
## 2 New York University New York 9
## 3 Boston University Massachusetts 6
## 4 Indiana University Bloomington Indiana 6
## 5 Stanford University California 6
## 6 George Washington University District of Columbia 5
## 7 Johns Hopkins University Maryland 5
## 8 Northeastern University Massachusetts 5
## 9 Northwestern University Illinois 5
## 10 Rutgers University New Jersey 5
## # ... with 43 more rows
# geom_col() instead of geom_bar(stat = 'identity')
ggplot(dsp %>%
group_by(SCHOOL, STATE) %>%
tally() %>%
arrange(desc(n)) %>%
filter(n > 2),
aes(x = reorder(SCHOOL, n), y = n, fill = n)) +
geom_col(colour = 'grey', alpha = 0.7) +
scale_fill_gradient(high = 'orchid4', low = 'orchid') +
coord_flip() +
labs(title = "Number of DS programs by SCHOOL", x = "SCHOOL", y = "count") +
theme(legend.position = 'none') +
scale_y_continuous(breaks = 0:15)
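The `group_by() %>% tally() %>% arrange(desc(n)) %>% filter(n > 2)` chain counts rows per (SCHOOL, STATE) pair, sorts the counts in descending order and keeps the pairs with more than two programs. A base R sketch of the same computation with `aggregate()`, on hypothetical rows:

```r
# Hypothetical rows, one per program
toy <- data.frame(SCHOOL = c("S1", "S1", "S1", "S2", "S2", "S3"),
                  STATE  = c("MA", "MA", "MA", "NY", "NY", "CA"),
                  stringsAsFactors = FALSE)

# One row per (SCHOOL, STATE) with the number of programs in column n
counts <- aggregate(list(n = rep(1L, nrow(toy))),
                    by = list(SCHOOL = toy$SCHOOL, STATE = toy$STATE),
                    FUN = sum)

# Sort in descending order and keep schools with more than two programs
counts <- counts[order(-counts$n), ]
top <- counts[counts$n > 2, ]
print(top)
```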
Next, the schools with more than two programs in a single delivery format (DELIVERY2), colored by format.
ggplot(dsp %>%
group_by(SCHOOL, STATE, DELIVERY2) %>%
tally() %>%
arrange(desc(n)) %>%
filter(n > 2),
aes(x = reorder(SCHOOL, n), y = n, fill = DELIVERY2)) +
geom_col(colour = 'grey', alpha = 0.7) +
coord_flip() +
labs(title = "Number of DS programs by SCHOOL", x = "SCHOOL", y = "count") +
scale_y_continuous(breaks = 0:15)
How many programs are there per STATE?
ggplot(dsp, aes(STATE)) +
geom_bar(fill = 'skyblue', colour = 'grey', alpha = .5) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = 'none') +
labs(title = "Number of DS programs by STATE", x = "STATE", y = "count")
I split the programs by STATE and by delivery format (DELIVERY2).
ggplot(dsp, aes(STATE, fill = DELIVERY2)) +
geom_bar(colour = 'grey', alpha = 0.7) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
labs(title = "Number of DS programs by STATE", x = "STATE", y = "count")
I show on a map the number of campus-delivered programs in each state.
map <- readOGR(dsn = "./cb_2016_us_state_500k", layer = "cb_2016_us_state_500k", encoding = "UTF-8")
## OGR data source with driver: ESRI Shapefile
## Source: "./cb_2016_us_state_500k", layer: "cb_2016_us_state_500k"
## with 56 features
## It has 9 fields
## Integer64 fields read as strings: ALAND AWATER
map <- map[!map@data$NAME %in% c('Alaska',
'Hawaii',
'Puerto Rico',
'Guam',
'United States Virgin Islands',
'Commonwealth of the Northern Mariana Islands',
'American Samoa'), ]
count = dsp %>% filter(DELIVERY2 == 'Campus') %>% group_by(STATE) %>% summarise(count_STATE = n())
map@data$count_STATE = count$count_STATE[match(map@data$NAME, count$STATE)]
pal <- colorNumeric("Reds", c(0, max(map@data$count_STATE, na.rm = TRUE)))
banner <- paste("<strong>State: </strong>",
map@data$NAME,
"<br>DS-programs: ",
map@data$count_STATE)
leaflet(data = map) %>%
addTiles() %>%
addPolygons(fillOpacity = 0.8,
smoothFactor = 0.5,
color = ~pal(count_STATE),
popup = banner) %>%
addLegend("bottomright",
values = ~count_STATE,
pal = pal) %>%
addPolylines(color = "red")
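The `match()` call that attaches the counts to the shapefile returns, for each polygon name, the position of that name in `count$STATE`, or `NA` when the state has no campus programs; those polygons are then drawn with the palette's `na.color`. A base R sketch with hypothetical names and counts:

```r
# Hypothetical polygon names (as in map@data$NAME) and per-state counts
poly_names <- c("Texas", "Ohio", "Vermont", "Maryland")
counts <- data.frame(STATE = c("Maryland", "Texas", "Ohio"),
                     count_STATE = c(8L, 5L, 2L),
                     stringsAsFactors = FALSE)

# Position of each polygon name in counts$STATE (NA if absent)
idx <- match(poly_names, counts$STATE)
print(idx)  # 2 3 NA 1

# Counts reordered to follow the polygons, like map@data$count_STATE
aligned <- counts$count_STATE[idx]
print(aligned)  # 5 2 NA 8
```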
Which CITY has the most programs?
dsp %>% group_by(CITY, STATE) %>% tally() %>% arrange(desc(n))
## # A tibble: 180 x 3
## # Groups: CITY [172]
## CITY STATE n
## <fctr> <fctr> <int>
## 1 New York New York 17
## 2 Boston Massachusetts 13
## 3 Chicago Illinois 12
## 4 Waltham Massachusetts 11
## 5 Washington District of Columbia 9
## 6 Baltimore Maryland 8
## 7 Denver Colorado 8
## 8 Philadelphia Pennsylvania 7
## 9 Rochester New York 7
## 10 Bloomington Indiana 6
## # ... with 170 more rows
ggplot(dsp %>%
group_by(CITY, STATE) %>%
tally() %>%
arrange(desc(n)) %>%
filter(n > 2),
aes(x = reorder(CITY, n), y = n)) +
geom_bar(stat = 'identity', fill = 'tomato', colour = 'grey', alpha = 0.5) +
coord_flip() +
geom_text(aes(label = n), nudge_y = 1, color = 'tomato', size = 2.5) +
labs(title = "Number of DS programs by CITY", x = "CITY", y = "count")
Next, the cities with more than two programs in a single delivery format (DELIVERY2), colored by format.
ggplot(dsp %>%
group_by(CITY, STATE, DELIVERY2) %>%
tally() %>%
arrange(desc(n)) %>%
filter(n > 2),
aes(x = reorder(CITY, n), y = n, fill = DELIVERY2)) +
geom_bar(stat = 'identity', colour = 'grey', alpha = 0.7) +
coord_flip() +
labs(title = "Number of DS programs by CITY", x = "CITY", y = "count")
The categories of PROGRAM are:
#summary(dsp$PROGRAM)
dsp %>% group_by(PROGRAM) %>% summarize(n=n()) %>% arrange(desc(n))
## # A tibble: 311 x 2
## PROGRAM n
## <fctr> <int>
## 1 Master of Science in Business Analytics 24
## 2 Master of Science in Analytics 10
## 3 Graduate Certificate in Business Analytics 9
## 4 Master of Science in Data Science 8
## 5 Master of Science in Applied Statistics 7
## 6 Master of Science in Information Systems 7
## 7 Master of Science in Data Analytics 5
## 8 Master of Science in Health Informatics 4
## 9 Online Master of Science in Data Science 4
## 10 Certificate in Data Science 3
## # ... with 301 more rows
I create a new column, PROGRAM2, by normalizing PROGRAM and replacing keywords with tags such as *M*, *MS*, *DS* or *Cert*.
# \\b means 'word boundary'
PROGRAM2 <- as.character(dsp$PROGRAM)
PROGRAM2 %<>%
tolower() %>%
gsub('\\.|\\:|\\,', '', .) %>%
gsub('\\&', 'and', .) %>%
gsub('\\(|\\)', '', .) %>%
gsub('master\'s', 'master', .) %>%
gsub('masters', 'master', .) %>%
gsub('\\bms\\b', '*M* *MS* *Sc*', .) %>%
gsub('master of science', '*M* *MS* *Sc*', .) %>%
gsub('\\bmba\\b', '*M* *MBA* *B*', .) %>%
gsub('master of business administration', '*M* *MBA* *B*', .) %>%
gsub('master of business and science', '*M* *MBS* *B* *Sc*', .) %>%
gsub('master', '*M*', .) %>%
gsub('diploma', '*Cert*', .) %>%
gsub('certificate', '*Cert*', .) %>%
gsub('doctor', '*PhD*', .) %>%
gsub('phd', '*PhD*', .) %>%
gsub('\\bds\\b', '*DS*', .) %>%
gsub('computational data science', '*DS* *CS* *CDS*', .) %>%
gsub('computational and data science', '*DS* *CS* *CDS*', .) %>%
gsub('data science', '*DS*', .) %>%
gsub('computer science', '*CS*', .) %>%
gsub('computational science', '*CS*', .) %>%
gsub('business analytics', '*BI* *B* *Analytics*', .) %>%
gsub('\\bbi\\b', '*BI* *B* *Analytics*', .) %>%
gsub('business intelligence', '*BI* *B* *Analytics*', .) %>%
gsub('data mining', 'mining', .) %>%
gsub('mining', '*Analytics*', .) %>%
gsub('applied statistics', 'statistics', .) %>%
gsub('statistical', 'statistics', .) %>%
gsub('statistics', '*Stats*', .) %>%
gsub('informatics', 'analytics', .) %>%
gsub('data analytics', 'analytics', .) %>%
gsub('analytics', '*Analytics*', .) %>%
gsub('information systems technology', '*IS* *IT*', .) %>%
gsub('management information systems', '*IM* *IS*', .) %>%
gsub('information management', 'im', .) %>%
gsub('information systems', 'is', .) %>%
gsub('\\bim\\b', '*IM*', .) %>%
gsub('\\bis\\b', '*IS*', .) %>%
gsub('information technology', 'it', .) %>%
gsub('\\bit\\b', '*IT*', .) %>%
gsub('health', '*Health-Bio*', .) %>%
gsub('bio', '*Health-Bio*', .) %>%
gsub('urban', '*Urban*', .) %>%
gsub('public', '*Public*', .) %>%
gsub('\\bin\\b|\\band\\b|\\bof\\b|\\bwith\\b|\\ba\\b|\\bfor\\b|\\bthe\\b|\\bat\\b', '', .) %>%
gsub('^a |^the ', '', .)
dsp$PROGRAM2 <- PROGRAM2
words <- dsp %>% select(PROGRAM, PROGRAM2)
write.csv(words, 'words.csv')
#words <- unlist(strsplit(PROGRAM2," "))
#words <- as.data.frame(table(words)) %>% arrange(desc(Freq))
#write.csv(words, 'words.csv')
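The order of the `gsub()` calls matters: longer phrases such as 'master of science' must be tagged before the generic 'master' rule, otherwise they are reduced to *M* and the *MS* tag is lost. A reduced sketch with three of the rules above; the helper names `tag_program` and `tag_wrong_order` are hypothetical:

```r
# Hypothetical helper with three of the rules above, applied in order
tag_program <- function(x) {
  x <- tolower(x)
  x <- gsub("master of science", "*M* *MS* *Sc*", x)  # specific phrase first
  x <- gsub("master", "*M*", x)                       # then the generic rule
  x <- gsub("data science", "*DS*", x)
  x
}

# Same rules with the first two swapped
tag_wrong_order <- function(x) {
  x <- tolower(x)
  x <- gsub("master", "*M*", x)
  x <- gsub("master of science", "*M* *MS* *Sc*", x)  # can no longer match
  x <- gsub("data science", "*DS*", x)
  x
}

p <- "Master of Science in Data Science"
print(tag_program(p))      # "*M* *MS* *Sc* in *DS*"
print(tag_wrong_order(p))  # "*M* of science in *DS*"
```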
What TYPE are the programs? How many are 'C' (Certificates) and how many are 'M' (Masters)?
ggplot(dsp, aes(TYPE, fill = TYPE)) +
geom_bar() +
geom_text(stat = 'count', aes(label = ..count.., y = ..count..), vjust = 1.5)
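I create two new columns from the tags in PROGRAM2: TYPE2 (degree level: 'M', 'Cert' or 'PhD') and PROG (degree type: 'MBA', 'MBS', 'MS', 'Cert' or 'PhD').
dsp$TYPE2 <- NA
dsp$PROG <- NA
# Caveat: grep() treats these patterns as regular expressions, not literal
# strings. In "*M*" the trailing "M*" can match an empty string, so the pattern
# is much looser than the literal tag *M* (in this data it matches every row);
# the PhD and Cert assignments below then overwrite TYPE2 for those programs.
dsp$TYPE2[grep("*M*", dsp$PROGRAM2)] <- 'M'
# Later assignments overwrite earlier ones: "*MBA*" also matches *MBS* tags,
# which the "MBS" line then corrects.
dsp$PROG[grep("*MBA*", dsp$PROGRAM2)] <- 'MBA'
dsp$PROG[grep("MBS", dsp$PROGRAM2)] <- 'MBS'
dsp$PROG[grep("MS", dsp$PROGRAM2)] <- 'MS'
dsp[grep("*PhD*", dsp$PROGRAM2), c('TYPE2','PROG')] <- 'PhD'
dsp[grep("*Cert*", dsp$PROGRAM2), c('TYPE2','PROG')] <- 'Cert'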
dsp %>% group_by(TYPE2, PROG) %>% tally()
## # A tibble: 6 x 3
## # Groups: TYPE2 [?]
## TYPE2 PROG n
## <chr> <chr> <int>
## 1 Cert Cert 99
## 2 M MBA 31
## 3 M MBS 2
## 4 M MS 219
## 5 M <NA> 53
## 6 PhD PhD 11
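To match the tags literally, `grepl()` accepts `fixed = TRUE`, or the asterisks can be escaped in the pattern. Note that with a literal match a row such as `msm - *BI* *B* *Analytics*` would no longer be tagged 'M', so abbreviations like msm would need a rule of their own. A small sketch with hypothetical PROGRAM2 values:

```r
prog2 <- c("*M* *DS*", "msm - *BI* *B* *Analytics*", "*Cert* *DS*")

# Literal match of the tag *M*
lit <- grepl("*M*", prog2, fixed = TRUE)
print(lit)  # TRUE FALSE FALSE

# Same result with escaped asterisks in a regular expression
esc <- grepl("\\*M\\*", prog2)
stopifnot(identical(lit, esc))
```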
dsp[is.na(dsp$PROG),c('PROGRAM','PROGRAM2')]
## PROGRAM
## 9 Professional Science Master's in Data Management and Analysis
## 10 Professional Science Master's Degree in Predictive Analytics
## 15 Master of Professional Science in Technology Innovation with Focus in Bioinformatics
## 22 Master of Business Analytics
## 24 Masters of Information Technology
## 35 Master of Arts in Computational Linguistics
## 48 Master of Information Systems Management, Business Intelligence and Data Analytics (MISM-BIDA)
## 49 Master of Computational Data Science (MCDS)
## 50 MSM - Business Analytics
## 62 Master of Applied Statistics (M.A.S.)
## 68 Master of Professional Studies (MPS) in Applied Statistics (Option II: Data Science)
## 71 MA in Data Analytics & Applied Social Research
## 76 Online Master's in Health Informatics
## 79 Master of Quantitative Management
## 80 Master of Arts in Applied Statistics
## 82 Master in Health Informatics
## 87 Master's in Data Analytics
## 92 Masters in Information Systems
## 112 Master of Data Science
## 141 Master of Business Analytics
## 149 Health Informatics Master's Program
## 153 Master in Data Science
## 168 Master of Professional Studies in Informatics
## 183 Master of Applied Statistics
## 184 Master of Public Health in Biomedical Informatics
## 193 Master of Professional Studies in Data Analytics
## 194 Master of Professional Studies in Data Analytics
## 195 Master of Applied Statistics
## 215 Master of Information
## 220 Online Master's in Health Administration: Informatics Specialization
## 221 Applied Analytics Master's Degree
## 254 Professional Science Master's Degree in Environmental Informatics
## 266 Master of Business Analytics
## 284 Online Master's in Management Information Systems
## 285 Master's in Management Information Systems
## 287 Professional Master of Information Systems
## 288 Master of Information Systems with Business Analytics Concentration
## 289 Online Master of Information and Data Science (MIDS)
## 290 Master of Engineering - Concentration in Data Science & Systems
## 291 Master of Information Management and Systems
## 294 Master's of Health Informatics
## 300 Master of Advanced Study in Data Science and Engineering
## 304 Professional Science Master's Program in Health Care Informatics
## 327 Master of Computer Science in Data Science
## 336 Master of Information Management
## 345 Master's Program in Bioinformatics
## 360 Professional Science Master's (PSM) in Data Science and Business Analytics (DSBA)
## 368 Master of Data Science and Analytics
## 371 Master's Degree in Biostatistics
## 372 Master's Degree in Biomedical Informatics
## 374 MSIS - Big Data Analytics
## 380 Master of Applied Statistics
## 410 Master of Professional Studies in Statistical and Data Sciences
## PROGRAM2
## 9 professional science *M* data management analysis
## 10 professional science *M* degree predictive *Analytics*
## 15 *M* professional science technology innovation focus *Health-Bio**Analytics*
## 22 *M* *BI* *B* *Analytics*
## 24 *M* *IT*
## 35 *M* arts computational linguistics
## 48 *M* *IS* management *BI* *B* *Analytics* *Analytics* mism-bida
## 49 *M* *DS* *CS* *CDS* mcds
## 50 msm - *BI* *B* *Analytics*
## 62 *M* *Stats* mas
## 68 *M* professional studies mps *Stats* option ii *DS*
## 71 ma *Analytics* applied social research
## 76 online *M* *Health-Bio* *Analytics*
## 79 *M* quantitative management
## 80 *M* arts *Stats*
## 82 *M* *Health-Bio* *Analytics*
## 87 *M* *Analytics*
## 92 *M* *IS*
## 112 *M* *DS*
## 141 *M* *BI* *B* *Analytics*
## 149 *Health-Bio* *Analytics* *M* program
## 153 *M* *DS*
## 168 *M* professional studies *Analytics*
## 183 *M* *Stats*
## 184 *M* *Public* *Health-Bio* *Health-Bio*medical *Analytics*
## 193 *M* professional studies *Analytics*
## 194 *M* professional studies *Analytics*
## 195 *M* *Stats*
## 215 *M* information
## 220 online *M* *Health-Bio* administration *Analytics* specialization
## 221 applied *Analytics* *M* degree
## 254 professional science *M* degree environmental *Analytics*
## 266 *M* *BI* *B* *Analytics*
## 284 online *M* *IM* *IS*
## 285 *M* *IM* *IS*
## 287 professional *M* *IS*
## 288 *M* *IS* *BI* *B* *Analytics* concentration
## 289 online *M* information *DS* mids
## 290 *M* engineering - concentration *DS* systems
## 291 *M* *IM* systems
## 294 *M* *Health-Bio* *Analytics*
## 300 *M* advanced study *DS* engineering
## 304 professional science *M* program *Health-Bio* care *Analytics*
## 327 *M* *CS* *DS*
## 336 *M* *IM*
## 345 *M* program *Health-Bio**Analytics*
## 360 professional science *M* psm *DS* *BI* *B* *Analytics* dsba
## 368 *M* *DS* *Analytics*
## 371 *M* degree *Health-Bio**Stats*
## 372 *M* degree *Health-Bio*medical *Analytics*
## 374 msis - big *Analytics*
## 380 *M* *Stats*
## 410 *M* professional studies *Stats* *DS*s
Which doctoral (PhD) programs are there? Note that the original TYPE column labels all of them 'M'.
dsp %>% filter(TYPE2 == 'PhD')
## SCHOOL STATE CITY
## 1 Chapman University California Orange
## 2 Colorado Technical University Colorado Colorado Springs
## 3 Indiana University Bloomington Indiana Bloomington
## 4 Kennesaw State University Georgia Kennesaw
## 5 New York University New York New York
## 6 University of Cincinnati Ohio Cincinnati
## 7 University of Maryland-College Park Maryland College Park
## 8 University of Massachusetts-Boston Massachusetts Boston
## 9 University of Southern California California Los Angeles
## 10 University of Washington-Seattle Campus Washington Seattle
## 11 Worcester Polytechnic Institute Massachusetts Worcester
## PROGRAM
## 1 Doctorate in Computational and Data Sciences
## 2 Doctor of Computer Science - Concentration in Big Data Analytics
## 3 Ph.D. Minor in Data Science
## 4 Ph.D. in Analytics and Data Science
## 5 Ph.D. in Computer Science with Specialization in Visualization, Databases and Big Data
## 6 Doctor of Philosopy in Biostatistics - Big Data Track
## 7 Ph.D. in Information Studies - Concentration in Big Data/Data Science
## 8 Ph.D. in Business Administration - Information Systems for Data Science Track
## 9 Ph.D. in Data Sciences & Operations
## 10 Ph.D. in Big Data and Data Science
## 11 Ph.D. in Data Science
## TYPE DEPARTMENT DELIVERY
## 1 M Schmid College of Science & Technology Campus
## 2 M Computer Science Department Online
## 3 M School of Informatics and Computing Campus or Online
## 4 M College of Science & Mathematics Campus
## 5 M Tandon School of Engineering Campus
## 6 M College Of Medicine Campus
## 7 M College of Information Studies Campus
## 8 M College of Management Campus
## 9 M Marshall School of Business Campus
## 10 M eScience Institute Campus
## 11 M College of Arts & Sciences Campus
## LINK
## 1 http://www.chapman.edu/scst/graduate/phd-computational-science.aspx
## 2 http://www.coloradotech.edu/degrees/doctorates/computer-science/big-data-analytics
## 3 http://www.soic.indiana.edu/graduate/degrees/data-science/graduate/index.html
## 4 https://analytics.kennesaw.edu/academics/grad/MSAS/msas-curr.html
## 5 http://steinhardt.nyu.edu/graduate_admissions/guide/assr/ms
## 6 https://eh.uc.edu/bio/academic-programs/phd-biostatistics/big-data/
## 7 http://ischool.umd.edu/tuition-fees
## 8 https://www.umb.edu/academics/caps/certificates/business_analytics/admission
## 9 http://www.marshall.usc.edu/msanalytics/program_cost
## 10 http://www.pce.uw.edu/certificates/data-science.html
## 11 http://www.wpi.edu/academics/datascience/certificate-program.html
## LOC_LAT LOC_LONG NUM_STUDENTS INTERNATIONAL_STUDENTS YEAR
## 1 33.7937 -117.8510 <NA> <NA> NA
## 2 38.8937 -104.8340 <NA> <NA> NA
## 3 39.1664 -86.5269 <NA> <NA> NA
## 4 34.0363 -84.5808 <NA> <NA> NA
## 5 40.7295 -73.9973 42,056 19% 2016
## 6 39.1312 -84.5143 36,108 6% 2016
## 7 38.9886 -76.9397 <NA> <NA> NA
## 8 42.3145 -71.0387 <NA> <NA> NA
## 9 34.0211 -118.2840 36,534 20% 2016
## 10 47.6562 -122.3130 <NA> <NA> NA
## 11 42.2751 -71.8088 <NA> <NA> NA
## DELIVERY2
## 1 Campus
## 2 Online
## 3 Campus or online
## 4 Campus
## 5 Campus
## 6 Campus
## 7 Campus
## 8 Campus
## 9 Campus
## 10 Campus
## 11 Campus
## PROGRAM2 TYPE2
## 1 *PhD*ate *DS* *CS* *CDS*s PhD
## 2 *PhD* *CS* - concentration big *Analytics* PhD
## 3 *PhD* minor *DS* PhD
## 4 *PhD* *Analytics* *DS* PhD
## 5 *PhD* *CS* specialization visualization databases big data PhD
## 6 *PhD* philosopy *Health-Bio**Stats* - big data track PhD
## 7 *PhD* information studies - concentration big data/*DS* PhD
## 8 *PhD* business administration - *IS* *DS* track PhD
## 9 *PhD* *DS*s operations PhD
## 10 *PhD* big data *DS* PhD
## 11 *PhD* *DS* PhD
## PROG
## 1 PhD
## 2 PhD
## 3 PhD
## 4 PhD
## 5 PhD
## 6 PhD
## 7 PhD
## 8 PhD
## 9 PhD
## 10 PhD
## 11 PhD
map2 <- dsp %>% filter(TYPE2 == 'PhD') %>%
select(LOC_LONG, LOC_LAT, LINK, PROGRAM, SCHOOL, CITY, STATE) %>%
rename(long = LOC_LONG) %>%
rename(lat = LOC_LAT)
leaflet(data = map) %>%
addTiles() %>%
addPolygons(fillOpacity = 0.8,
smoothFactor = 0.5,
color = ~pal(count_STATE)) %>%
addPolylines(color = "red") %>%
addMarkers(data = map2, ~long, ~lat, popup = ~paste(SCHOOL, PROGRAM, LINK, sep=':'), label = ~paste(CITY, STATE, sep = ","))
#leaflet(data = map2) %>% addTiles() %>%
# addCircleMarkers(~long, ~lat, popup = ~paste(SCHOOL, PROGRAM, LINK, sep=':'), label = ~paste(CITY, STATE, sep = ","))
Which Master of Science (PROG == 'MS') programs are there? I show them on the same map.
map2 <- dsp %>% filter(PROG == 'MS') %>%
select(LOC_LONG, LOC_LAT, LINK, PROGRAM, SCHOOL, CITY, STATE) %>%
rename(long = LOC_LONG) %>%
rename(lat = LOC_LAT)
leaflet(data = map) %>%
addTiles() %>%
addPolygons(fillOpacity = 0.8,
smoothFactor = 0.5,
color = ~pal(count_STATE)) %>%
addPolylines(color = "red") %>%
addMarkers(data = map2, ~long, ~lat, popup = ~paste(SCHOOL, PROGRAM, LINK, sep=':'), label = ~paste(CITY, STATE, sep = ","))
#leaflet(data = map2) %>% addTiles() %>%
# addCircleMarkers(~long, ~lat, popup = ~paste(SCHOOL, PROGRAM, LINK, sep=':'), label = ~paste(CITY, STATE, sep = ","))